Inferring Speciation Times under an Episodic Molecular Clock

Authors

  • BRUCE RANNALA
  • ZIHENG YANG
Abstract

We extend our recently developed Markov chain Monte Carlo algorithm for Bayesian estimation of species divergence times to allow variable evolutionary rates among lineages. The method can use heterogeneous data from multiple gene loci and accommodate multiple fossil calibrations. Uncertainties in fossil calibrations are described using flexible statistical distributions. The prior for divergence times for nodes lacking fossil calibrations is specified by use of a birth-death process with species sampling. The prior for lineage-specific substitution rates is specified using either a model with autocorrelated rates among adjacent lineages (based on a geometric Brownian motion model of rate drift) or a model with independent rates among lineages specified by a log-normal probability distribution. We develop an infinite-sites theory, which predicts that when the amount of sequence data approaches infinity, the width of the posterior credibility interval and the posterior mean of divergence times form a perfect linear relationship, with the slope indicating uncertainties in time estimates that cannot be reduced by sequence data alone. Simulations are used to study the influence of among-lineage rate variation and the number of loci sampled on the uncertainty of divergence time estimates. The analysis suggests that posterior time estimates typically involve considerable uncertainties even with an infinite amount of sequence data, and that the reliability and precision of fossil calibrations are critically important to divergence time estimation. We apply our new algorithms to two empirical data sets and compare the results with those obtained in previous Bayesian and likelihood analyses. The results demonstrate the utility of our new algorithms. [Bayesian method; divergence times; MCMC; molecular clock.]
The molecular clock hypothesis postulates that the molecular evolutionary rate is constant over time (Zuckerkandl and Pauling, 1965) and provides a simple indirect means for dating evolutionary events. The expected genetic distance between sequences increases linearly as a function of the time elapsed since their divergence, and fossil-based divergence dates can therefore be used to translate genetic distances into geological times, allowing divergence times to be inferred for species with no recent ancestor in the fossil record. The molecular clock hypothesis is often violated, however, particularly when distantly related species are compared, and such violations can lead to grossly incorrect species divergence time estimates (Bromham et al., 1998; Yoder and Yang, 2000; Adkins et al., 2003). One approach to dealing with a violation of the clock is to remove sequences so that the clock approximately holds for the remaining sequence data. This may be useful if only one or two lineages have grossly different rates and can be identified and removed (Takezaki et al., 1995) but is difficult to use if the rate variation is more widespread.

A more promising approach is to take explicit account of among-lineage rate variation when estimating divergence times. Variable-rates models have been the focus of much recent research, with both likelihood and Bayesian methodologies employed. In a likelihood analysis, prespecified lineages in the phylogeny are assigned independent rate parameters, estimated from the data (Kishino and Hasegawa, 1990; Rambaut and Bromham, 1998; Yoder and Yang, 2000). Recent extensions to the likelihood method (Yang and Yoder, 2003) allow the use of multiple calibration points and simultaneous analysis of data for multiple genes while accounting for their differences in substitution rates and in other aspects of the evolutionary process. The Bayesian approach, pioneered by Thorne et al. (1998) and Kishino et al. (2001; see also Huelsenbeck et al., 2000; Drummond et al., 2006), uses a stochastic model of evolutionary rate change to specify the prior distribution of rates and, with a prior for divergence times, calculates the posterior distributions of times and rates. Markov chain Monte Carlo (MCMC) is used to make the computation feasible. Such methods build on the suggestion by Gillespie (1984) that the rate of evolution may itself evolve over time and may be considered as more rigorous implementations of Sanderson's rate-smoothing procedure (Sanderson, 1997; Yang, 2004). The algorithm was extended to analyze multiple genes (Thorne and Kishino, 2002). The method has been applied successfully to estimate divergence times in a number of important species groups, such as the mammals (Hasegawa et al., 2003; Springer et al., 2003), the birds (Pereira and Baker, 2006), and plants (Bell and Donoghue, 2005).

Thorne et al. (1998) used lower and upper bounds for node ages to incorporate fossil calibration information. With this prior, divergence times outside the bounds are impossible in the posterior, whatever the data. Biologists may often lack sufficiently strong convictions to apply such "hard bounds" and, in particular, fossils often provide good lower bounds (minimal node ages) but not good upper bounds (maximal node ages). However, the posterior can be sensitive to changes to the upper bounds. This observation prompted Yang and Rannala (2006) to implement arbitrary prior distributions for the age at a fossil calibration node. Such "soft bounds" may sometimes provide a more accurate description of uncertainties in fossil ages. Our implementation, however, assumed the molecular clock.

SYSTEMATIC BIOLOGY VOL. 56

In this paper, we extend our previous model to relax the clock assumption.
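The contrast between hard and soft bounds on a calibration node can be illustrated with a small sketch. The construction below is only one possible soft-bound density, chosen for illustration: it is flat between the bounds and leaves a small probability (`tail`, assumed here to be 2.5%) on each side, using a power-law left tail and an exponential right tail. The function names, the tail shapes, and the example bounds are assumptions, not a statement of the authors' implementation.

```python
import math


def hard_bound_density(t, lower, upper):
    """Uniform prior on [lower, upper]; ages outside the bounds get zero density."""
    return 1.0 / (upper - lower) if lower <= t <= upper else 0.0


def soft_bound_density(t, lower, upper, tail=0.025):
    """Illustrative soft-bound prior for a node age t > 0.

    The density is flat between the bounds and carries probability `tail`
    below `lower` and probability `tail` above `upper`, so ages outside
    the bounds are unlikely but not impossible.
    """
    if t <= 0.0:
        return 0.0
    height = (1.0 - 2.0 * tail) / (upper - lower)  # flat part between the bounds
    if t < lower:
        # Power-law tail: continuous at `lower`, integrates to `tail` on (0, lower).
        power = height * lower / tail - 1.0
        return height * (t / lower) ** power
    if t <= upper:
        return height
    # Exponential tail: continuous at `upper`, integrates to `tail` on (upper, inf).
    decay = height / tail
    return height * math.exp(-decay * (t - upper))


# Hypothetical calibration: node age between 0.6 and 0.8 (time unit: 100 My).
age = 0.9
hard = hard_bound_density(age, 0.6, 0.8)
soft = soft_bound_density(age, 0.6, 0.8)
```

Under the hard bound, an age of 0.9 has zero prior density and therefore zero posterior density whatever the sequence data say; under the soft bound it keeps a small positive density, so strong data can still move the posterior past the fossil bound.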
We implement two prior models that allow the evolutionary rate to vary over time or across lineages. The first assumes a geometric Brownian motion process of rate drift over time, the model implemented by Thorne et al. (1998) and Kishino et al. (2001). Rates are autocorrelated between ancestral and descendant lineages on the tree. The second is an independent-rates model, with no autocorrelation. Instead, branch-specific rates are independent variables drawn from a common distribution. If higher rates tend to change more than lower rates, there may be little autocorrelation of rates between ancestral and descendant lineages (Gillespie, 1984). In such cases, independent-rates models may be more flexible in accommodating large rate shifts that may occur during species radiations due to rapid range expansions, increasing effective population sizes, enhanced selection, etc.

We also develop an "infinite-sites theory" to understand the limit of divergence time estimation; even when the number of sites in the sequence approaches infinity, the errors in posterior time estimates will not approach zero because of inherent uncertainties in fossil calibrations and the confounding effect of rates and times in the sequence data. Yang and Rannala (2006) studied the analytical properties of this problem when the data consist of one locus and the molecular clock is assumed. Here, the theory is extended to the general case of variable rates and multiple loci. We use computer simulation to assess the information content of sequence data from multiple loci for estimation of divergence times (in the limit of infinite sequence length). We also analyze two empirical data sets for comparison with previous methods.
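The difference between the two rate priors described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the autocorrelated model is simplified to a single ancestor-to-descendant chain of branches, the log-normal means are corrected by half the variance so that the expected rate is preserved (a common convention, assumed here), and all function and parameter names are hypothetical.

```python
import math
import random


def sample_autocorrelated_rates(root_rate, branch_durations, sigma2, rng):
    """Geometric Brownian motion sketch along one chain of branches.

    The log rate of each branch is normal, centred so that the expected
    rate equals the rate of its ancestor, with variance sigma2 * duration.
    """
    rates = []
    parent_rate = root_rate
    for duration in branch_durations:
        variance = sigma2 * duration
        log_rate = rng.gauss(math.log(parent_rate) - variance / 2.0,
                             math.sqrt(variance))
        parent_rate = math.exp(log_rate)  # the descendant becomes the next ancestor
        rates.append(parent_rate)
    return rates


def sample_independent_rates(mean_rate, n_branches, sigma2, rng):
    """Independent-rates sketch: every branch draws from one log-normal.

    sigma2 is the variance of the log rate; the mean of the log rate is
    shifted so that the expected rate equals mean_rate.
    """
    mean_log = math.log(mean_rate) - sigma2 / 2.0
    sd_log = math.sqrt(sigma2)
    return [math.exp(rng.gauss(mean_log, sd_log)) for _ in range(n_branches)]


rng = random.Random(2007)  # fixed seed so the sketch is reproducible
drifting = sample_autocorrelated_rates(1.0, [0.2, 0.3, 0.5], sigma2=0.5, rng=rng)
unlinked = sample_independent_rates(1.0, 3, sigma2=0.5, rng=rng)
```

In the first function each rate depends on the rate immediately above it, so nearby branches tend to have similar rates; in the second, a branch's rate carries no information about its neighbours, which is what lets the independent-rates model absorb abrupt rate shifts.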

Similar resources

Bayesian models of episodic evolution support a late precambrian explosive diversification of the Metazoa.

Multicellular animals, or Metazoa, appear in the fossil records between 575 and 509 million years ago (MYA). At odds with paleontological evidence, molecular estimates of basal metazoan divergences have been consistently older than 700 MYA. However, those date estimates were based on the molecular clock hypothesis, which is almost always violated. To relax this hypothesis, we have implemented a...

The hepatic circadian clock fine-tunes the lipogenic response to feeding through RORα/γ.

Liver lipid metabolism is under intricate temporal control by both the circadian clock and feeding. The interplay between these two mechanisms is not clear. Here we show that liver-specific depletion of nuclear receptors RORα and RORγ, key components of the molecular circadian clock, up-regulate expression of lipogenic genes only under fed conditions at Zeitgeber time 22 (ZT22) but not under fa...

Distances and directions in multidimensional shape spaces: implications for morphometric applications.

Aris-Brosou, S., and Z. Yang. 2002. Effects of models of rate evolution on estimation of divergence dates with special reference to the metazoan 18S ribosomal RNA phylogeny. Syst. Biol. 51:703–714. Aris-Brosou, S., and Z. Yang. 2003. Bayesian models of episodic evolution support a late precambrian explosive diversification of the metazoa. Mol. Biol. Evol. 20:1947–1954. Ayala, F. J., A. Rzhetsky...

Is Speciation Accompanied by Rapid Evolution? Insights from Comparing Reproductive and Nonreproductive Transcriptomes in Drosophila

The tempo and mode of evolutionary change during speciation have remained contentious until recently. While much of the evidence claiming speciation is an abrupt and rapid process comes from fossil data, recent molecular phylogenetics show that the background of gradual evolution is often broken by accelerated rates of molecular evolution during speciation. However, what kinds of genes affect o...

New analytic results for speciation times in neutral models.

In this paper, we investigate the standard Yule model, and a recently studied model of speciation and extinction, the "critical branching process." We develop an analytic way-as opposed to the common simulation approach-for calculating the speciation times in a reconstructed phylogenetic tree. Simple expressions for the density and the moments of the speciation times are obtained. Methods for d...

Journal title: Systematic Biology

Volume: 56

Publication date: 2007